CHAPTER 23 - XLAT
The 800 pound gorilla in the computer field is, of course, IBM.
It can go its own way and other companies have to adjust to keep
themselves in line with what IBM is doing.
You have been using ASCII characters since the first time you
used BASIC (or whatever your first high-level language was).
Every character has a unique number which represents it.
     character      ASCII encoding
         A               65d
         a               97d
         ?               63d
         0               48d
IBM has its own encoding for mainframe computers. It is called
EBCDIC (pronounced ebb'-sih-dick).{1} It is a spinoff of the
coding on punch cards. You remember punch cards? This coding is
entirely different from ASCII. Here are some examples.
     character      ASCII code      EBCDIC code
         a              97d            129d
         ?              63d            111d
         0              48d            240d
         H              72d            200d
         I              73d            201d
         J              74d            209d
         K              75d            210d
You can see that there is no relationship between the two
encodings. Also, notice that while the alphabet is a continuous
section of ASCII coding, there are breaks in the EBCDIC code
(I=201, J=209).
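The breaks show up as soon as you try to compute one code from the
other. The following C sketch (C is used here only as a compact
model; the function name is hypothetical and is not part of the
tutor's files) converts an ASCII capital letter to EBCDIC. In ASCII
the capitals are one run, 65 - 90. In EBCDIC they are three separate
runs: 193 - 201 (A-I), 209 - 217 (J-R) and 226 - 233 (S-Z).

```c
/* Sketch: map an ASCII capital letter to its EBCDIC code.
   ascii_upper_to_ebcdic is a hypothetical name.
   Returns 0 if c is not an ASCII capital letter. */
unsigned char ascii_upper_to_ebcdic(unsigned char c)
{
    if (c >= 65 && c <= 73)                      /* A-I */
        return (unsigned char)(193 + (c - 65));
    if (c >= 74 && c <= 82)                      /* J-R */
        return (unsigned char)(209 + (c - 74));
    if (c >= 83 && c <= 90)                      /* S-Z */
        return (unsigned char)(226 + (c - 83));
    return 0;
}
```

Three range tests for one alphabet is the kind of special casing
that a lookup table avoids.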
All PCs use ASCII, so if we want to transfer text from a PC to an
IBM mainframe computer, we need to change ASCII -> EBCDIC going
to the mainframe and change EBCDIC -> ASCII coming from the
mainframe. This is the responsibility of the communications
program that runs the modem, so you will never have to do it
yourself. Intel has provided an instruction to help the
communications program do this translation. It is called XLAT.
In order to use XLAT, you need a translation table. This is a 256
byte array in which element number x holds the translation of the
value x. Looking at the data above:
____________________
1. Which stands for Extended Binary Coded Decimal Interchange
Code.
______________________
The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson
     CHARACTER    ASCII TO EBCDIC TABLE    EBCDIC TO ASCII TABLE
         a        array1 [97] = 129        array2 [129] = 97
         ?        array1 [63] = 111        array2 [111] = 63
         0        array1 [48] = 240        array2 [240] = 48
         H        array1 [72] = 200        array2 [200] = 72
         I        array1 [73] = 201        array2 [201] = 73
         J        array1 [74] = 209        array2 [209] = 74
         K        array1 [75] = 210        array2 [210] = 75
We have two different tables here. Array1 takes the ASCII
encoding and gives back the EBCDIC encoding. Array2 takes the
EBCDIC encoding and gives back the ASCII encoding. For each
character, the appropriate table gives the correct translation
from one encoding to another. All we need now is the translation
instruction. Put the address of the translation table in BX. This
table should be in the DS segment, but DS may be overridden:
        mov   bx, offset ascii_to_ebcdic_table
Put the character you want translated in AL:
        mov   al, character
Then translate:
        xlat
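As a model of what the instruction does: XLAT replaces AL with the
byte found AL bytes into the table. Here is a minimal C sketch of
that, assuming tables that hold only the seven sample characters
listed above (every other entry is left at 0 here, while a real
table would fill all 256). The names xlat and fill_tables are
hypothetical.

```c
/* The seven sample characters and their codes from the tables above. */
static const unsigned char ascii_codes[7]  = {  97,  63,  48,  72,  73,  74,  75 };
static const unsigned char ebcdic_codes[7] = { 129, 111, 240, 200, 201, 209, 210 };

unsigned char ascii_to_ebcdic_table[256];   /* array1 */
unsigned char ebcdic_to_ascii_table[256];   /* array2 */

/* Fill both tables. Entries for other characters stay 0 in this sketch. */
void fill_tables(void)
{
    int i;
    for (i = 0; i < 7; i++) {
        ascii_to_ebcdic_table[ascii_codes[i]] = ebcdic_codes[i];
        ebcdic_to_ascii_table[ebcdic_codes[i]] = ascii_codes[i];
    }
}

/* Model of XLAT: "table" plays the part of BX, "al" the part of AL. */
unsigned char xlat(const unsigned char *table, unsigned char al)
{
    return table[al];
}
```

After fill_tables(), xlat(ascii_to_ebcdic_table, 63) gives 111, and
feeding that result to ebcdic_to_ascii_table gives 63 back.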
To translate a 20 byte string of ASCII data into EBCDIC, you
might have the following code:
   ;----------
        mov   di, offset ebcdic_string
        mov   ax, seg ebcdic_string
        mov   es, ax
        mov   si, offset ascii_string
        mov   bx, offset ascii_to_ebcdic_table
        mov   cx, 20              ; translate 20 bytes
        cld                       ; clear DF (increment)
   translation_loop:
        lodsb                     ; ascii to al
        xlat                      ; translate
        stosb                     ; al to ebcdic
        loop  translation_loop
   ;----------
Since this is ASCII to EBCDIC, if AL contained 63 before XLAT,
then after XLAT AL would contain 111. If AL contained 73 before
XLAT, then after XLAT it would contain 201. If AL contained 97
before XLAT, after XLAT it would contain 129.
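The loop can be modeled in C as load, translate, store, one byte at
a time. This is only a sketch: the function names are hypothetical,
and the sample table holds just H, I, J and K from the data above.

```c
#include <stddef.h>

/* Model of the lodsb / xlat / stosb loop:
   translate count bytes from src to dst through table. */
void translate_bytes(unsigned char *dst, const unsigned char *src,
                     size_t count, const unsigned char *table)
{
    size_t i;
    for (i = 0; i < count; i++) {
        unsigned char al = src[i];   /* lodsb */
        al = table[al];              /* xlat  */
        dst[i] = al;                 /* stosb */
    }
}

/* Build a partial ASCII -> EBCDIC table holding only H, I, J and K. */
void fill_sample_table(unsigned char *table)
{
    int i;
    for (i = 0; i < 256; i++)
        table[i] = 0;
    table[72] = 200;   /* H */
    table[73] = 201;   /* I */
    table[74] = 209;   /* J */
    table[75] = 210;   /* K */
}
```

Going the other direction needs no new loop, only a different table
pointer, which is the point of the BX change described next.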
If we wanted to go the other direction we would have to make the
EBCDIC string the source string, make the ASCII string the
destination string, and use the other table:
        mov   bx, offset ebcdic_to_ascii_table
The rest of the code would be the same.
Since this is done by the communications program, we won't
concern ourselves with ASCII <-> EBCDIC any more, but we will use
XLAT in two slightly different ways.
First, let's categorize characters. Some things are whitespace
(that is, tabs, newlines, spaces, form feeds, etc.). Some
characters are octal, decimal, punctuation, hex, etc. There is a
pre-existing table called translation_table in the subdirectory
XTRAFILE. Its pathname is \xtrafile\transtbl.obj. It has all 256
ASCII characters coded in the following way:
        WHITESPACE   EQU  80h     ; 1000 0000
        PUNCTUATION  EQU  40h     ; 0100 0000
        ALPHABETIC   EQU  20h     ; 0010 0000
        OCTAL        EQU  10h     ; 0001 0000
        DECIMAL      EQU  08h     ; 0000 1000
        HEX          EQU  04h     ; 0000 0100
        BOX_CHAR     EQU  02h     ; 0000 0010
        GREEK_CHAR   EQU  01h     ; 0000 0001
If the character is whitespace, then the leftmost bit is set. If
it is a Greek character (ASCII 224 - 239 on the PC) then the
rightmost bit is set. If it is more than one thing, then the
appropriate bits are set. For instance, '6' is octal, decimal and
hex, so its encoding is:
        '6'     0001 1100
'a' is both alphabetic and hex, so its encoding is:
        'a'     0010 0100
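The table itself comes only as an object file, so the following C
sketch shows one way a table like it could be built. The bit values
are the ones listed above. Which characters count as box drawing
(179 - 218 here) is an assumption, the whitespace and punctuation
tests come from the C library rather than from the tutor's table,
and the function names are hypothetical.

```c
#include <ctype.h>

#define WHITESPACE  0x80
#define PUNCTUATION 0x40
#define ALPHABETIC  0x20
#define OCTAL       0x10
#define DECIMAL     0x08
#define HEX         0x04
#define BOX_CHAR    0x02
#define GREEK_CHAR  0x01

/* Compute the class bits for one character code (0 - 255). */
unsigned char classify(int c)
{
    unsigned char bits = 0;

    if (c < 128) {                       /* ctype tests only for plain ASCII */
        if (isspace(c))  bits |= WHITESPACE;
        if (ispunct(c))  bits |= PUNCTUATION;
        if (isalpha(c))  bits |= ALPHABETIC;
        if (isxdigit(c)) bits |= HEX;
    }
    if (c >= '0' && c <= '7') bits |= OCTAL;
    if (c >= '0' && c <= '9') bits |= DECIMAL;
    if (c >= 179 && c <= 218) bits |= BOX_CHAR;    /* assumed range */
    if (c >= 224 && c <= 239) bits |= GREEK_CHAR;  /* range given in the text */
    return bits;
}

/* Fill a 256 byte table suitable for an XLAT-style lookup. */
void build_class_table(unsigned char table[256])
{
    int c;
    for (c = 0; c < 256; c++)
        table[c] = classify(c);
}
```

With this sketch, the entry for '6' comes out as 1Ch (0001 1100) and
the entry for 'a' as 24h (0010 0100), the same as the two encodings
shown above.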
The following program inputs a character, and finds out whether
it is punctuation, a letter, etc. If it is none of the eight
things, then the program prints that nothing was found. It is the
same block of code over and over, so you might want to do only
part, or you might want to cut it out with a word processor and
insert it in the template file (don't forget to delete the page
headers and page numbers).
; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
EXTRN translation_table:BYTE        ;\xtrafile\transtbl.obj
whitespace_banner   db  "It is whitespace." , 0
punctuation_banner  db  "It is punctuation." , 0
alphabet_banner     db  "It is alphabetic." , 0
octal_banner        db  "It is octal." , 0
decimal_banner      db  "It is decimal." , 0
hex_banner          db  "It is hex." , 0
drawing_banner      db  "It is a box drawing character." , 0
greek_banner        db  "It is a Greek character." , 0
nothing_banner      db  "No match was found." , 0
dirty_flag          db  ?
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE
; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
WHITESPACE   EQU  80h
PUNCTUATION  EQU  40h
ALPHABETIC   EQU  20h
OCTAL        EQU  10h
DECIMAL      EQU  08h
HEX          EQU  04h
BOX_CHAR     EQU  02h
GREEK_CHAR   EQU  01h
        ; set up the xlat table
        mov   ax, seg translation_table
        mov   es, ax
        mov   bx, offset translation_table
outer_loop:
        mov   dirty_flag, 0       ; marker for success
        call  get_ascii_byte      ; input a byte to al
        xlat  es:[bx]             ; do the translation
        test  al, WHITESPACE
        jz    punct_check
        push  ax                  ; save translation in al
        mov   ax, offset whitespace_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1       ; set the dirty flag
punct_check:
        test  al, PUNCTUATION
        jz    alpha_check
        push  ax                  ; save translation in al
        mov   ax, offset punctuation_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1       ; set the dirty flag
alpha_check:
        test  al, ALPHABETIC
        jz    octal_check
        push  ax                  ; save translation in al
        mov   ax, offset alphabet_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1       ; set the dirty flag
octal_check:
        test  al, OCTAL
        jz    decimal_check
        push  ax                  ; save translation in al
        mov   ax, offset octal_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1       ; set the dirty flag
decimal_check:
        test  al, DECIMAL
        jz    hex_check
        push  ax                  ; save translation in al
        mov   ax, offset decimal_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1       ; set the dirty flag
hex_check:
        test  al, HEX
        jz    drawing_check
        push  ax                  ; save translation in al
        mov   ax, offset hex_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1       ; set the dirty flag
drawing_check:
        test  al, BOX_CHAR
        jz    greek_check
        push  ax                  ; save translation in al
        mov   ax, offset drawing_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1       ; set the dirty flag
greek_check:
        test  al, GREEK_CHAR
        jz    nothing_check
        push  ax                  ; save translation in al
        mov   ax, offset greek_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1       ; set the dirty flag
nothing_check:
        cmp   dirty_flag, 0       ; was anything found?
        je    print_nothing_banner
        jmp   outer_loop
print_nothing_banner:
        mov   ax, offset nothing_banner
        call  print_string
        jmp   outer_loop
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
You need to:
        link prog1+transtbl+\asmhelp ;
The program is long, but straightforward. Input a character and
get its encoding. Test for each characteristic. If it is found,
print the appropriate message and set the dirty_flag to indicate
something was printed. At the end, if nothing was printed, print
the failure message.
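Because the eight tests differ only in the mask and the message, the
same logic can be modeled as a loop over a small list. The following
C sketch is not the tutor's code; describe_flags and the struct are
hypothetical names. It returns the number of messages found, so a
return of zero corresponds to dirty_flag still being 0.

```c
#include <stddef.h>

struct class_name {
    unsigned char mask;      /* bit to test */
    const char *banner;      /* message for that bit */
};

static const struct class_name class_names[8] = {
    { 0x80, "It is whitespace." },
    { 0x40, "It is punctuation." },
    { 0x20, "It is alphabetic." },
    { 0x10, "It is octal." },
    { 0x08, "It is decimal." },
    { 0x04, "It is hex." },
    { 0x02, "It is a box drawing character." },
    { 0x01, "It is a Greek character." }
};

/* Store a pointer to each matching banner in out (room for 8 is enough)
   and return how many matched. */
size_t describe_flags(unsigned char flags, const char *out[8])
{
    size_t found = 0;
    size_t i;
    for (i = 0; i < 8; i++) {
        if (flags & class_names[i].mask)     /* same test as "test al, MASK" */
            out[found++] = class_names[i].banner;
    }
    return found;
}
```

For the encoding of '6' (0001 1100) this yields three banners:
octal, decimal and hex, in that order.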
Notice that the translation table is in ES and we are using a
segment override for it. If you look at the EXTRN statement for
'translation_table', you will see that even though we are using
ES, it is declared EXTRN in a segment with an:
        ASSUME ds:DATASTUFF
statement. How can we get away with this? The assembler never
deals with 'translation_table' directly. The only thing it does
is put the offset in BX. We put the segment override in ourselves
with:
        xlat  es:[bx]
so the assembler never has to decide whether a segment override
is necessary or which segment override to use.
WORD SEARCH
When doing the mock word search program in the chapter on string
instructions, I mentioned that it really wouldn't cut the mustard
when it comes to real word searches. Why? If we are looking for
"when" we also want to find "When". If we are looking for
" searches ", we also want to find " searches,", that is,
punctuation should not interfere unless we want it to, and
capitals should not interfere unless we want them to. With the
aid of a translation table, we will make a word search program
which uses the following rules. In the SEARCH string (the string
that defines what you are looking for):
(1) Any small letter will match either a small or large
letter.
(2) A capital letter will match only a capital letter.
(3) A blank will match any whitespace or punctuation.
(4) A punctuation mark will only match itself.
With these rules "Why" must start with a capital 'W' to be a
match, but 'h' and 'y' may be either capital or small. " some,"
may have any whitespace (including a carriage return) in front,
but must have a comma ',' at the end.
This program has two data files. \XTRAFILE\SRCHTBL.OBJ contains
the translation table. It is called "wordsearch_table" and is in
DATASTUFF, so will be in our normal DS segment. In order to have
text to search I have included an object file that is the text of
a chapter from a book. (The object file text includes carriage
returns). The text is a C string - it is terminated by a 0.
The book was written by C.D. Huffam, and is the autobiographical
account of his dual life as a writer and lecturer. The book is
called "A Tale of Two C.D.s". The object file with the text is
\XTRAFILE\TWOTALE.OBJ. It is in a private segment and will use ES
as a segment register. There is also a straight text file which
you can print out so you can see what is in the object file. It
is \XTRAFILE\TWOTALE.DOC.
Here's the program. The explanation is at the end.
; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
EXTRN tale_text:BYTE, wordsearch_table:BYTE
entry_message     db  13,10, "Enter a word for a word search", 0
no_match_message  db  "There was no match", 0
input_buffer      db  80 dup (?)
text_file_length  dw  ?
letter_count      dw  ?
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE
; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
        ; find the length of the text file
        mov   ax, seg tale_text        ; load es register
        mov   es, ax
        mov   di, offset tale_text     ; offset to di
        mov   bx, di                   ; copy to bx
        mov   al, 0                    ; try to match zero
        cld                            ; clear DF (increment)
string_end_loop:
        scasb                          ; search for zero
        jne   string_end_loop
        dec   di                       ; one too many , so decrement
        sub   di, bx                   ; finish - start = length
        mov   text_file_length, di     ; length of text_file
big_loop:
        ; get a word for the word search
        mov   ax, offset entry_message
        call  print_string
        mov   ax, offset input_buffer
        call  get_string
        ; find the end of string
        mov   al, 0                    ; compare with 0
        mov   bx, offset input_buffer
        mov   cx, 0                    ; letter count
letter_count_loop:
        cmp   al, [bx]                 ; compare to 0
        je    end_of_count_loop
        inc   cx                       ; increment count
        inc   bx                       ; increment pointer
        jmp   letter_count_loop
end_of_count_loop:
        cmp   cx, 0                    ; if 0, string is empty
        je    big_loop                 ; so start again
        mov   letter_count, cx
        ; look for word match. In this program, the text string
        ; is referenced by si and the search string is referenced
        ; by di.
        mov   si, offset tale_text
        mov   cx, text_file_length     ; length of file
        sub   cx, letter_count         ; last possible match
        inc   cx                       ; +1 for boundary condition
        ; set up translation table ( it is in DATASTUFF )
        mov   bx, offset wordsearch_table
word_search_loop:
        push  si                       ; save a copy
        push  cx                       ; save a copy
        mov   di, offset input_buffer
        mov   cx, letter_count
letter_loop:
        mov   al, es:[si]              ; text to al
        cmp   al, [di]                 ; same as search string?
        je    next_letter
        xlat                           ; if not, translate
        cmp   al, [di]                 ; allowable substitute?
        jne   new_start                ; if not, start at new place
next_letter:
        inc   di                       ; move to next letter
        inc   si
        loop  letter_loop
        ; we fell through, so we found a complete match
        jmp   found_it
        ; no match. are we finished?
new_start:
        pop   cx
        pop   si
        inc   si                       ; move to next character
        loop  word_search_loop
        ; we fell through. finished, but no match
        mov   ax, offset no_match_message
        call  print_string
        jmp   big_loop
found_it:
        pop   cx                       ; take cx off the stack
        pop   si                       ; start of the match
        ; move 25 characters to buffer for printing
        mov   di, offset input_buffer
        mov   cx, 25
character_move:
        mov   al, es:[si]
        mov   [di], al
        inc   si                       ; increment pointers
        inc   di
        loop  character_move
        mov   BYTE PTR [di], 0         ; end of string
        mov   ax, offset input_buffer
        call  print_string
        jmp   big_loop
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
You need to:
        link prog2+twotale+srchtbl+\asmhelp ;
to get asmhelp and the two data files in the program.
This program is very similar to the search program in the chapter
on strings. However, because of where the files are, the pointers
have been changed around. Therefore, it is safer if you simply
cut out the program with a word processor and paste it into the
template file rather than try to modify the previous search
program.{2}
It is assumed that you did the string match program. The logic is
the same and will not be covered again. First we input a search
string. Then, starting at the beginning of the text to be searched,
we check until we find the first match. If we find a match, we
print out 25 characters starting with the first character of the
match. If no match is found, a message to that effect is printed.
The character match is a two step process. The character from the
text is put in AL. It is compared with the search character for
an EXACT match. If they match, we are done. If not, we use XLAT
on AL (the character from the text) which will translate to its
allowable substitute. In fact, all this is just: (1) all capital
letters become small, (2) all punctuation becomes spaces, and (3)
all whitespace becomes spaces. Once again, we compare AL with the
search character. If we have a match, ok. If not, we start over.
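The two step match can be modeled in C as follows. This is a sketch
under stated assumptions, not the contents of SRCHTBL.OBJ: the table
is rebuilt here from the three substitutions just described, using
the C library's idea of punctuation and whitespace, and the names
build_search_table and find_word are hypothetical. The text is
passed with its length, as the assembler program does with
text_file_length.

```c
#include <ctype.h>
#include <stddef.h>

/* Build the substitution table: capitals become small letters,
   punctuation and whitespace become a space, everything else
   maps to itself. */
void build_search_table(unsigned char table[256])
{
    int c;
    for (c = 0; c < 256; c++) {
        if (c < 128 && isupper(c))
            table[c] = (unsigned char)tolower(c);
        else if (c < 128 && (ispunct(c) || isspace(c)))
            table[c] = ' ';
        else
            table[c] = (unsigned char)c;
    }
}

/* Return the index of the first match of pattern in text, or -1 if none. */
long find_word(const char *text, size_t text_len,
               const char *pattern, size_t pat_len,
               const unsigned char table[256])
{
    size_t start, i;

    if (pat_len == 0 || pat_len > text_len)
        return -1;
    /* text_len - pat_len + 1 possible starting places */
    for (start = 0; start + pat_len <= text_len; start++) {
        for (i = 0; i < pat_len; i++) {
            unsigned char al = (unsigned char)text[start + i];
            unsigned char want = (unsigned char)pattern[i];
            if (al == want)              /* step 1: exact match */
                continue;
            if (table[al] != want)       /* step 2: allowable substitute */
                break;
        }
        if (i == pat_len)
            return (long)start;
    }
    return -1;
}
```

Note that only the character from the text is translated, never the
search character. That is what makes a capital in the search string
match only a capital, while a small letter matches both.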
The text is in ES, the translation table is in DS, so it is
inconvenient to use the string instructions in this program.
Try to match a word at the beginning of the line, end of the
line, with and without punctuation and with and without capitals.
If you go across a line break, you need to substitute two blanks
in the search string for CRLF (13,10).
____________________
2. You should understand what is going on in the code before
you run these programs. I didn't write the code for myself, I
wrote it for you. If you run it but don't understand it, it won't
help you a bit.
Suppose you are not interested in all 256 values of the
translation table. Let's say that you only want to have a
translation table for the numbers from 0 to 99. Can you still use
this? Yes, but you need to put in some range checking to make
sure that you have valid data.
        MAX_VALUE  EQU  99
        mov   al, data_byte       ; byte to al
        cmp   al, MAX_VALUE       ; too large?
        ja    data_error          ; report error
        xlat
This ensures that any data that is out of range is not
translated. Therefore the translation table only needs to be 100
bytes long (0 - 99).
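The same guard, modeled in C. This is a sketch only; checked_xlat is
a hypothetical name, and returning 0 stands in for the jump to
data_error.

```c
#define MAX_VALUE 99

/* Translate value through a 100 byte table.
   Returns 1 and stores the result on success, 0 if value is out of range. */
int checked_xlat(const unsigned char table[MAX_VALUE + 1],
                 unsigned char value, unsigned char *result)
{
    if (value > MAX_VALUE)     /* cmp al, MAX_VALUE / ja data_error */
        return 0;
    *result = table[value];    /* xlat */
    return 1;
}
```

The comparison is unsigned, like JA in the assembler version, so any
byte from 100 to 255 is rejected before the table is touched.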
If you want more than 256 elements in the translation table you
need to use words, not bytes, and you cannot use XLAT. You can
make your own code to do the same thing.
        MAX_VALUE             EQU  999
        my_translation_table  dw   1000 dup (?)
If you put the translation data into the table, you can then have
the following code:
        mov   bx, offset my_translation_table
        ; - - - - - translation block
        mov   si, data_word       ; word to si
        cmp   si, MAX_VALUE       ; too large?
        ja    data_error
        shl   si, 1               ; SI x 2 = number of BYTES into table
        mov   ax, [bx+si]         ; base + offset
        ; - - - - - end of translation block
XLAT is about twice as fast as this last code, so when you have a
choice always use XLAT.
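For comparison, here is the word table lookup modeled in C. It is a
sketch with a hypothetical function name, and a return of 0 stands
in for the jump to data_error. The doubling of SI has no visible
counterpart because C array indexing scales the index by the element
size for you.

```c
#define MAX_VALUE 999

/* Translate a 16 bit value through a table of 1000 words.
   Returns 1 and stores the result on success, 0 if index is out of range. */
int word_xlat(const unsigned short table[MAX_VALUE + 1],
              unsigned short index, unsigned short *result)
{
    if (index > MAX_VALUE)     /* cmp si, MAX_VALUE / ja data_error */
        return 0;
    *result = table[index];    /* mov ax, [bx+si] after shl si, 1 */
    return 1;
}
```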
SUMMARY
XLAT
BX holds the offset of a 256 byte array called a
translation table. The table is in the DS segment unless you
use a segment override. AL holds the character to be
translated. If x is the value in AL before XLAT, then after
XLAT, AL=array[x].